A simple try assisted with GPT : nonblocking gather/scatter exchanges by mystic-qaq · Pull Request #7396 · deepmodeling/abacus-develop

mystic-qaq · 2026-05-29T10:55:32Z

Copilot

Pull request overview

Refactors PW_Basis::gatherp_scatters and PW_Basis::gathers_scatterp from blocking MPI_Alltoallv to non-blocking MPI_Irecv/MPI_Isend exchanges with overlapping pack/unpack work, adds a per-instance reusable communication workspace, and introduces a round-trip unit test.

Changes:

Replace MPI_Alltoallv with non-blocking sends/receives plus MPI_Waitsome-driven unpack overlap in both gather/scatter directions, separating send/receive into distinct workspace slices.
Add acquire_comm_workbuf<T>() returning per-instance mutable std::vector storage (float and double specializations) and add fine-grained timer regions.
Add test_comm_roundtrip.cpp (round-trip equality and a zero-plane "stress" layout sweep) and register it in the test CMakeLists.txt.
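
The "distinct workspace slices" idea in the list above can be pictured with a small sketch. It is not taken from the PR: `SliceLayout` and `make_slice_layout` are hypothetical names, and the real routines may order or size their slices differently. The sketch only shows one way to carve a single contiguous buffer into per-rank send and receive regions that never overlap, including ranks that exchange zero elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the PR: records where each rank's outgoing
// and incoming data live inside one contiguous workspace.
struct SliceLayout
{
    std::vector<std::size_t> send_offset; // start of the slice sent to rank r
    std::vector<std::size_t> recv_offset; // start of the slice received from rank r
    std::size_t total = 0;                // elements needed for the whole workspace
};

// Build the layout from per-rank element counts. All send slices are placed
// first, followed by all receive slices, so a pending receive can never write
// into memory that a pending send is still reading.
inline SliceLayout make_slice_layout(const std::vector<std::size_t>& send_count,
                                     const std::vector<std::size_t>& recv_count)
{
    SliceLayout layout;
    layout.send_offset.resize(send_count.size());
    layout.recv_offset.resize(recv_count.size());

    std::size_t cursor = 0;
    for (std::size_t r = 0; r < send_count.size(); ++r)
    {
        layout.send_offset[r] = cursor; // a zero count yields an empty slice
        cursor += send_count[r];
    }
    for (std::size_t r = 0; r < recv_count.size(); ++r)
    {
        layout.recv_offset[r] = cursor;
        cursor += recv_count[r];
    }
    layout.total = cursor;
    return layout;
}
```

With such a layout, each non-blocking send or receive would be posted on its own slice, and a completed receive can be unpacked while the others are still in flight.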

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| source/source_basis/module_pw/pw_gatherscatter.h | Rewrites both routines to use Irecv/Isend with manual self-copy, dedicated send/recv workspace, and overlapped unpack via `MPI_Waitsome`. |
| source/source_basis/module_pw/pw_basis.h | Declares `acquire_comm_workbuf` plus mutable per-instance buffers; adds `<vector>` include. |
| source/source_basis/module_pw/test/test_comm_roundtrip.cpp | New round-trip tests using a friend accessor subclass to call the protected gather/scatter methods. |
| source/source_basis/module_pw/test/CMakeLists.txt | Registers the new test source file. |

mohanchen · 2026-05-30T23:38:04Z

  std::string precision = "double"; ///< single, double, mixing
  bool double_data_ = true;         ///<  if has double data
  bool float_data_ = false;         ///< if has float data
+  mutable std::vector<std::complex<float>> comm_workbuf_float_;


Using the mutable keyword is not recommended: it breaks const semantics, hides state changes,
and introduces potential thread-safety risks. Use it only as a last resort.

…rk buffers

Remove the mutable keyword from comm_workbuf_float_ and comm_workbuf_double_ by switching from std::vector (whose const data() returns const T*) to std::unique_ptr<T[]> (whose const get() returns T*).

Key changes:

- Pre-allocate work buffers in allocate_comm_buffers(), called from getstartgr(), using the already-computed numr/startr/numg/startg arrays to determine the maximum required buffer size
- acquire_comm_workbuf<T>() no longer resizes lazily; it returns the pre-allocated buffer via unique_ptr::get() with an assertion guard
- Add cleanup in the destructor via unique_ptr::reset()

Rationale: unique_ptr::get() is a const method that returns a non-const T*, matching the semantic intent: a const PW_Basis does not re-seat the buffer pointer, but the pointed-to scratch memory remains mutable for MPI write operations. This avoids the thread-safety concerns of mutable while maintaining const-correctness throughout the gather/scatter call chain.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
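
The pointer-type difference this commit message relies on can be checked in isolation. The following is a minimal sketch, not the PR's code: `ScratchOwner` and `acquire` are hypothetical stand-ins for `PW_Basis` and `acquire_comm_workbuf<T>()`, and the buffer is sized by a constructor argument rather than by the numr/startr/numg/startg arrays.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Seen through a const object, std::vector::data() yields a pointer to const,
// while std::unique_ptr<T[]>::get() yields a pointer to non-const.
static_assert(std::is_same<decltype(std::declval<const std::vector<int>&>().data()),
                           const int*>::value, "vector propagates const");
static_assert(std::is_same<decltype(std::declval<const std::unique_ptr<int[]>&>().get()),
                           int*>::value, "unique_ptr does not propagate const");

// Hypothetical stand-in for a class that owns pre-allocated scratch memory.
class ScratchOwner
{
  public:
    explicit ScratchOwner(std::size_t max_elems)
        : size_(max_elems), buf_(new std::complex<double>[max_elems]()) {}

    // const method: the owning pointer is never re-seated here, but the
    // returned pointer is writable, so no mutable member is required.
    std::complex<double>* acquire(std::size_t n) const
    {
        assert(n <= size_ && "workspace was pre-allocated too small");
        return buf_.get();
    }

  private:
    std::size_t size_;                            // capacity fixed at construction
    std::unique_ptr<std::complex<double>[]> buf_; // scratch storage
};
```

Because the returned pointer aliases storage shared by every caller of the same object, concurrent calls on one instance would still need their own synchronization; the type system does not enforce it in either design.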

Include standalone microbenchmark (bench_comm.cpp) comparing blocking vs nonblocking MPI gather/scatter, and PR_DESCRIPTION.md with design rationale and performance validation results.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Replace the simplified microbenchmark with a benchmark that directly calls PW_Basis::gatherp_scatters()/gathers_scatterp() (feat/unblock) and compares against the exact blocking implementations from the develop branch. Uses realistic ABACUS parameters (10A cell, ecut=100Ry, 64^3 FFT grid).

Key results: nonblocking is 1.06x-1.45x faster at 3+ MPI ranks, with maximum speedup of 1.45x at 4 ranks with 2 OpenMP threads.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

mystic-qaq added 2 commits May 29, 2026 18:17

feat(pw): harden nonblocking gather/scatter exchanges

21f2211

Merge branch 'develop' into feat/unblock

04b7149

mohanchen added the Refactor and Tests/Examples labels May 29, 2026

Cstandardlib requested a review from Copilot May 29, 2026 14:01

Copilot started reviewing on behalf of Cstandardlib May 29, 2026 14:01

Copilot AI reviewed May 29, 2026

mohanchen added the project_learning label and removed the Refactor label May 29, 2026

Merge branch 'deepmodeling:develop' into feat/unblock

04103f8

mohanchen reviewed May 30, 2026

mystic-qaq and others added 6 commits May 31, 2026 16:21

chore: remove simplified microbenchmark, replaced by bench_real_comm.cpp

b2c3a8a

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chore: remove PR description, benchmark, and stub files from PR branch

7b4c624

test(pw): avoid copying noncopyable PW_Basis big fixtures

a15f687

mystic-qaq wants to merge 9 commits into deepmodeling:develop from mystic-qaq:feat/unblock

3 participants
